The MITRE Identification Scrubber Toolkit: Design, training, and assessment

Authors

  • John S. Aberdeen
  • Samuel Bayer
  • Reyyan Yeniterzi
  • Ben Wellner
  • Cheryl Clark
  • David A. Hanauer
  • Bradley Malin
  • Lynette Hirschman
Abstract

PURPOSE Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to different document types, using automatically learned classifiers to de-identify and protect sensitive information.

METHODS MIST was evaluated with four classes of patient records from the Vanderbilt University Medical Center: discharge summaries, laboratory reports, letters, and order summaries. We trained and tested MIST on each class of record separately, as well as on pooled sets of records. We measured precision, recall, F-measure and accuracy at the word level for the detection of patient identifiers as designated by the HIPAA Safe Harbor Rule.

RESULTS MIST was applied to medical records that differed in the amounts and types of protected health information (PHI): lab reports contained only two types of PHI (dates, names) compared to discharge summaries, which were much richer. Performance of the de-identification tool depended on record class; F-measure results were 0.996 for order summaries, 0.996 for discharge summaries, 0.943 for letters and 0.934 for laboratory reports. Experiments suggest the tool requires several hundred training exemplars to reach an F-measure of at least 0.9.

CONCLUSIONS The MIST toolkit makes possible the rapid tailoring of automated de-identification to particular document types and supports the transition of the de-identification software to medical end users, avoiding the need for developers to have access to original medical records. We are making the MIST toolkit available under an open source license to encourage its application to diverse data sets at multiple institutions.
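The word-level measures named in METHODS (precision, recall, F-measure and accuracy) can be illustrated with a minimal sketch. This is not MIST's own scoring code: the function name `word_level_scores`, the boolean-per-word representation, and the choice of the balanced F-measure (F1) are assumptions made for illustration, since the abstract does not specify how the scores were computed.

```python
def word_level_scores(gold, predicted):
    """Compare per-word PHI labels (hypothetical helper, not part of MIST).

    gold, predicted: equal-length sequences of booleans, one per word,
    where True means the word is marked as a patient identifier (PHI).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must label the same words")

    # Count the four outcomes over aligned word positions.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: balanced F-measure (harmonic mean of precision and recall).
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(gold) if len(gold) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Toy labels for six words; the values are invented for illustration.
    gold = [True, True, False, False, True, False]
    predicted = [True, False, False, True, True, False]
    print(word_level_scores(gold, predicted))
```

In a de-identification setting recall is usually the more safety-critical figure, because a missed identifier (a false negative) leaves PHI in the shared record, whereas a false positive only removes a harmless word.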

Similar resources

Design and evaluation of local ventilation system and packed bed scrubber and control of hydrogen sulfide emitted from the dryer lines of a Pulp Industry

Abstract Introduction: The diffusion of the pollutants methyl mercaptan, hydrogen sulfide, dimethyl sulfide and dimethyl disulfide from the dryer machines of pulp industries causes odor annoyance and health effects for employees and neighbors. This study aimed to determine the effectiveness of a local ventilation system and packed bed scrubber implemented for controlling the emitted hydrogen...

Identification and assessment of training needs for employees of wind farms

In this paper, the training needs of wind farm employees have been specified, assessed and prioritized. For this purpose, first of all, the main tasks of wind farm employees have been identified. Afterwards, four criteria—including task complexity, task importance, task time duration and task frequency—have been considered to assess and prioritize tasks. In this respect, the Analytic Hierar...

Transport and Application Protocol Scrubbing

This paper describes the design and implementation of a protocol scrubber, a transparent interposition mechanism for explicitly removing network attacks at both the transport and application protocol layers. The transport scrubber supports downstream passive network-based intrusion detection systems; whereas the application scrubbing mechanism supports transparent fail-closed active network-bas...

The Challenges of Creating a Gold Standard for De-identification Research

We created a Gold Standard corpus comprising over 20,000 records of annotated narrative clinical reports for use in the training and evaluation of NLM Scrubber, a de-identification software system for medical records. Our experience with designing the corpus demonstrated the conceptual complexity of the task.

Translation, Adaptation and Validation of Referral Systems Assessment and Monitoring Toolkit for the Family Physicians Program in Iran

Background and purpose: Studies on the function of the referral system in Iran have not covered all of its aspects and structures. This could be due to the lack of an appropriate tool for investigating the referral system in Iran. The current study was done to translate and assess the validity of the Referral Systems Assessment and Monitoring (RSAM) Toolkit based on family physician ...

Journal title:
  • International journal of medical informatics

Volume 79, Issue 12

Pages -

Publication date: 2010